Overview

Dataset statistics

Number of variables: 9
Number of observations: 17000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.2 MiB
Average record size in memory: 72.0 B
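The memory figures fit a table of nine numeric columns holding 8-byte values. The arithmetic below assumes every column is stored as float64 and ignores the small overhead of the row index:

```python
# Assumption: all 9 columns are float64 (8 bytes per value); index overhead is ignored.
N_ROWS = 17000
N_COLS = 9
BYTES_PER_VALUE = 8

column_bytes = N_ROWS * BYTES_PER_VALUE   # one column ("Memory size" per variable)
total_bytes = column_bytes * N_COLS       # whole table ("Total size in memory")
record_bytes = total_bytes / N_ROWS       # bytes per row ("Average record size in memory")

print(f"Memory size per variable: {column_bytes / 1024:.1f} KiB")
print(f"Total size in memory: {total_bytes / 1024 ** 2:.1f} MiB")
print(f"Average record size in memory: {record_bytes:.1f} B")
```

For a real DataFrame `df`, `df.memory_usage(index=True).sum()` returns the measured total, which includes the index.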

Variable types

NUM: 9

Reproduction

Analysis started: 2020-06-05 19:42:14.742353
Analysis finished: 2020-06-05 19:42:28.878704
Duration: 14.14 seconds
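The duration is the difference between the two timestamps, rounded to two decimals. The check below uses only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2020-06-05 19:42:14.742353")
finished = datetime.fromisoformat("2020-06-05 19:42:28.878704")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```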
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

latitude is highly correlated with longitude (High correlation)
longitude is highly correlated with latitude (High correlation)
total_bedrooms is highly correlated with total_rooms and 1 other field (High correlation)
total_rooms is highly correlated with total_bedrooms and 1 other field (High correlation)
households is highly correlated with total_rooms and 2 other fields (High correlation)
population is highly correlated with households (High correlation)
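The report does not say how "highly correlated" is decided. One plausible rule flags every pair whose absolute coefficient exceeds a cut-off. The sketch below assumes a symmetric correlation matrix and a cut-off of 0.9. Both the helper `high_correlation_pairs` and the threshold are hypothetical and are not taken from pandas-profiling:

```python
def high_correlation_pairs(names, matrix, threshold=0.9):
    """Return (name_a, name_b, coefficient) for every pair with |coefficient| > threshold.

    Hypothetical helper: `matrix` is a square, symmetric list of lists whose
    rows and columns follow the order of `names`; 0.9 is an assumed cut-off.
    """
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle: skips self-pairs and mirrored duplicates
            coef = matrix[i][j]
            if abs(coef) > threshold:
                flagged.append((names[i], names[j], coef))
    return flagged


# Toy matrix for illustration only; these are not coefficients from this report.
toy_names = ["a", "b", "c"]
toy_matrix = [
    [1.00, 0.95, 0.10],
    [0.95, 1.00, -0.92],
    [0.10, -0.92, 1.00],
]
for a, b, coef in high_correlation_pairs(toy_names, toy_matrix):
    print(f"{a} is highly correlated with {b} (r = {coef:+.2f})")
```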

Variables

longitude
Real number (ℝ)

HIGH CORRELATION

Distinct count: 827
Unique (%): 4.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -119.5621082352941
Minimum: -124.35
Maximum: -114.31
Zeros: 0
Zeros (%): 0.0%
Memory size: 132.8 KiB

Quantile statistics

Minimum: -124.35
5th percentile: -122.47
Q1: -121.79
Median: -118.49
Q3: -118
95th percentile: -117.07
Maximum: -114.31
Range: 10.04
Interquartile range (IQR): 3.79
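The quantile statistics match linear interpolation between sorted values, which is the default in pandas' `Series.quantile` and NumPy's `percentile`. For total_rooms, with 17000 values, the 5th percentile sits at 0-based position 0.05 × 16999 = 849.95 and Q3 at 0.75 × 16999 = 12749.25. These fractions match the reported 626.95 and 3151.25.

The sketch below uses only the standard library. `statistics.quantiles(..., method="inclusive")` applies the same interpolation rule. It runs on the total_rooms values of the first ten rows listed under Sample, so its results are illustrative only and differ from the full-column figures:

```python
import statistics


def quantile_summary(values):
    """Quantile statistics with linear interpolation between sorted values.

    method="inclusive" treats the minimum as the 0th and the maximum as the
    100th percentile, the same rule as the pandas/NumPy default ("linear").
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    cuts = statistics.quantiles(data, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    return {
        "Minimum": data[0],
        "5th percentile": cuts[0],
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "95th percentile": cuts[-1],
        "Maximum": data[-1],
        "Range": data[-1] - data[0],
        "Interquartile range (IQR)": q3 - q1,
    }


# total_rooms of the first ten rows listed under "Sample" (illustration only)
sample_rooms = [5612, 7650, 720, 1501, 1454, 1387, 2907, 812, 4789, 1497]
for label, value in quantile_summary(sample_rooms).items():
    print(f"{label}: {value:g}")
```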

Descriptive statistics

Standard deviation: 2.005166408
Coefficient of variation (CV): -0.0167709188
Kurtosis: -1.322329668
Mean: -119.5621082
Median Absolute Deviation (MAD): 1.28
Skewness: -0.3040029768
Sum: -2032555.84
Variance: 4.020692325
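The report does not state which estimators it uses, so the sketch below assumes pandas' defaults:

- Standard deviation and variance use the sample estimate (n - 1 denominator).
- CV is standard deviation divided by mean, which is why it is negative for longitude.
- MAD is the median of absolute deviations from the median.
- Skewness and kurtosis are the bias-corrected estimators returned by `Series.skew()` and `Series.kurt()`, with kurtosis reported as excess kurtosis.

The reported numbers agree with the first two points: 2.005166408² ≈ 4.020692325, and 2.005166408 / -119.5621082 ≈ -0.01677.

```python
import statistics


def descriptive_summary(values):
    """Descriptive statistics under assumed pandas-style defaults.

    Needs at least 4 values that are not all equal.
    """
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation, n - 1 denominator
    median = statistics.median(values)
    mad = statistics.median(abs(x - median) for x in values)

    # Sums of centred powers used by the bias-corrected estimators.
    s2 = sum((x - mean) ** 2 for x in values)
    s3 = sum((x - mean) ** 3 for x in values)
    s4 = sum((x - mean) ** 4 for x in values)

    # Adjusted Fisher-Pearson skewness.
    skewness = n * (n - 1) ** 0.5 / (n - 2) * s3 / s2 ** 1.5
    # Bias-corrected excess kurtosis (0 for a normal distribution).
    kurtosis = (n * (n + 1) * (n - 1) * s4 / ((n - 2) * (n - 3) * s2 ** 2)
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

    return {
        "Standard deviation": std,
        "Coefficient of variation (CV)": std / mean,
        "Kurtosis": kurtosis,
        "Mean": mean,
        "Median Absolute Deviation (MAD)": mad,
        "Skewness": skewness,
        "Sum": sum(values),
        "Variance": std ** 2,
    }


# total_rooms of the first ten rows listed under "Sample" (illustration only)
sample_rooms = [5612, 7650, 720, 1501, 1454, 1387, 2907, 812, 4789, 1497]
for label, value in descriptive_summary(sample_rooms).items():
    print(f"{label}: {value:.6g}")
```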
[Figure: Histogram with fixed size bins (bins=10)]
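Only the histogram's caption survives in this export. A fixed-size-bin histogram splits [minimum, maximum] into equal-width intervals (10 here) and counts the values in each. The sketch below assumes NumPy's edge convention and runs on the longitude values of the first ten rows listed under Sample:

```python
def fixed_width_histogram(values, bins=10):
    """Counts per equal-width bin over [min, max].

    Bins are half-open [left, right) except the last, which also includes the
    maximum (NumPy's convention). Values lying exactly on an interior edge may
    land one bin off because of floating-point rounding.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    edges = [lo + k * width for k in range(bins)] + [hi]
    counts = [0] * bins
    for x in values:
        idx = int((x - lo) / width) if width > 0 else 0
        counts[min(idx, bins - 1)] += 1  # fold the maximum into the last bin
    return edges, counts


# longitude of the first ten rows listed under "Sample" (illustration only)
sample_lon = [-114.31, -114.47, -114.56, -114.57, -114.57,
              -114.58, -114.58, -114.59, -114.59, -114.60]
edges, counts = fixed_width_histogram(sample_lon)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:8.3f} to {right:8.3f}: {count}")
```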
Common values

Value                 Count  Frequency (%)
-118.31               136    0.8%
-118.3                128    0.8%
-118.32               124    0.7%
-118.29               118    0.7%
-118.35               116    0.7%
-118.36               115    0.7%
-118.27               114    0.7%
-118.28               113    0.7%
-118.37               111    0.7%
-118.19               110    0.6%
Other values (817)    15815  93.0%

Minimum 5 values

Value                 Count  Frequency (%)
-124.35               1      < 0.1%
-124.3                2      < 0.1%
-124.27               1      < 0.1%
-124.26               1      < 0.1%
-124.25               1      < 0.1%

Maximum 5 values

Value                 Count  Frequency (%)
-114.31               1      < 0.1%
-114.47               1      < 0.1%
-114.56               1      < 0.1%
-114.57               2      < 0.1%
-114.58               2      < 0.1%
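The first frequency table under each variable lists the ten most frequent values. The rest are pooled into one "Other values (k)" row, where k counts the remaining distinct values. The next two tables list the five smallest and the five largest distinct values with their counts.

The sketch below uses `collections.Counter`. The percentage format (one decimal, with non-zero shares that round to 0.0% shown as "< 0.1%") is inferred from the tables, and the order of equally frequent values may differ from the report's:

```python
from collections import Counter


def fmt_pct(count, total):
    """One-decimal percentage; non-zero shares that round to 0.0 become '< 0.1%'."""
    text = f"{100 * count / total:.1f}"
    return "< 0.1%" if text == "0.0" and count > 0 else f"{text}%"


def common_values(values, top=10):
    """The `top` most frequent values plus one pooled 'Other values (k)' row."""
    counts = Counter(values)
    total = len(values)
    rows = [(v, c, fmt_pct(c, total)) for v, c in counts.most_common(top)]
    n_other = len(counts) - len(rows)          # distinct values not listed
    if n_other:
        other = total - sum(c for _, c, _ in rows)
        rows.append((f"Other values ({n_other})", other, fmt_pct(other, total)))
    return rows


def extreme_values(values, k=5):
    """The k smallest and k largest distinct values with their counts."""
    counts = Counter(values)
    total = len(values)
    ordered = sorted(counts)
    smallest = [(v, counts[v], fmt_pct(counts[v], total)) for v in ordered[:k]]
    largest = [(v, counts[v], fmt_pct(counts[v], total)) for v in reversed(ordered[-k:])]
    return smallest, largest


# longitude of the first ten rows listed under "Sample" (illustration only)
sample_lon = [-114.31, -114.47, -114.56, -114.57, -114.57,
              -114.58, -114.58, -114.59, -114.59, -114.60]
for row in common_values(sample_lon, top=3):
    print(row)
```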
 

latitude
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 840
Unique (%): 4.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.62522470588235
Minimum: 32.54
Maximum: 41.95
Zeros: 0
Zeros (%): 0.0%
Memory size: 132.8 KiB

Quantile statistics

Minimum: 32.54
5th percentile: 32.82
Q1: 33.93
Median: 34.25
Q3: 37.72
95th percentile: 38.96
Maximum: 41.95
Range: 9.41
Interquartile range (IQR): 3.79

Descriptive statistics

Standard deviation: 2.137339795
Coefficient of variation (CV): 0.05999512459
Kurtosis: -1.112226493
Mean: 35.62522471
Median Absolute Deviation (MAD): 1.2
Skewness: 0.4718011204
Sum: 605628.82
Variance: 4.568221398
[Figure: Histogram with fixed size bins (bins=10)]
Common values

Value                 Count  Frequency (%)
34.06                 205    1.2%
34.08                 200    1.2%
34.05                 196    1.2%
34.07                 194    1.1%
34.04                 188    1.1%
34.09                 178    1.0%
34.1                  171    1.0%
34.02                 169    1.0%
34.03                 162    1.0%
33.94                 146    0.9%
Other values (830)    15191  89.4%

Minimum 5 values

Value                 Count  Frequency (%)
32.54                 1      < 0.1%
32.55                 3      < 0.1%
32.56                 9      0.1%
32.57                 13     0.1%
32.58                 20     0.1%

Maximum 5 values

Value                 Count  Frequency (%)
41.95                 2      < 0.1%
41.88                 1      < 0.1%
41.86                 3      < 0.1%
41.84                 1      < 0.1%
41.82                 1      < 0.1%
 

housing_median_age
Real number (ℝ≥0)

Distinct count: 52
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.58935294117647
Minimum: 1.0
Maximum: 52.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 132.8 KiB

Quantile statistics

Minimum: 1
5th percentile: 8
Q1: 18
Median: 29
Q3: 37
95th percentile: 52
Maximum: 52
Range: 51
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 12.58693698
Coefficient of variation (CV): 0.4402665918
Kurtosis: -0.8008262247
Mean: 28.58935294
Median Absolute Deviation (MAD): 10
Skewness: 0.06489403293
Sum: 486019
Variance: 158.4309826
[Figure: Histogram with fixed size bins (bins=10)]
Common values

Value                 Count  Frequency (%)
52                    1052   6.2%
36                    715    4.2%
35                    692    4.1%
16                    635    3.7%
17                    576    3.4%
34                    567    3.3%
33                    513    3.0%
26                    503    3.0%
18                    478    2.8%
25                    461    2.7%
Other values (42)     10808  63.6%

Minimum 5 values

Value                 Count  Frequency (%)
1                     2      < 0.1%
2                     49     0.3%
3                     46     0.3%
4                     161    0.9%
5                     199    1.2%

Maximum 5 values

Value                 Count  Frequency (%)
52                    1052   6.2%
51                    32     0.2%
50                    112    0.7%
49                    111    0.7%
48                    135    0.8%
 

total_rooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 5533
Unique (%): 32.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2643.664411764706
Minimum: 2.0
Maximum: 37937.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 132.8 KiB

Quantile statistics

Minimum: 2
5th percentile: 626.95
Q1: 1462
Median: 2127
Q3: 3151.25
95th percentile: 6269.05
Maximum: 37937
Range: 37935
Interquartile range (IQR): 1689.25

Descriptive statistics

Standard deviation: 2179.947071
Coefficient of variation (CV): 0.8245929634
Kurtosis: 29.51588478
Mean: 2643.664412
Median Absolute Deviation (MAD): 792
Skewness: 4.002729999
Sum: 44942295
Variance: 4752169.234
[Figure: Histogram with fixed size bins (bins=10)]
Common values

Value                 Count  Frequency (%)
1582                  16     0.1%
1527                  15     0.1%
1717                  14     0.1%
1471                  14     0.1%
1703                  14     0.1%
1613                  13     0.1%
2053                  13     0.1%
1724                  13     0.1%
1875                  12     0.1%
2017                  12     0.1%
Other values (5523)   16864  99.2%

Minimum 5 values

Value                 Count  Frequency (%)
2                     1      < 0.1%
8                     1      < 0.1%
11                    1      < 0.1%
12                    1      < 0.1%
15                    2      < 0.1%

Maximum 5 values

Value                 Count  Frequency (%)
37937                 1      < 0.1%
32627                 1      < 0.1%
32054                 1      < 0.1%
30405                 1      < 0.1%
30401                 1      < 0.1%
 

total_bedrooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 1848
Unique (%): 10.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 539.4108235294118
Minimum: 1.0
Maximum: 6445.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 132.8 KiB

Quantile statistics

Minimum: 1
5th percentile: 138
Q1: 297
Median: 434
Q3: 648.25
95th percentile: 1283
Maximum: 6445
Range: 6444
Interquartile range (IQR): 351.25

Descriptive statistics

Standard deviation: 421.4994516
Coefficient of variation (CV): 0.7814071079
Kurtosis: 19.69275009
Mean: 539.4108235
Median Absolute Deviation (MAD): 162
Skewness: 3.322636716
Sum: 9169984
Variance: 177661.7877
[Figure: Histogram with fixed size bins (bins=10)]
Common values

Value                 Count  Frequency (%)
280                   48     0.3%
309                   44     0.3%
394                   43     0.3%
331                   43     0.3%
345                   43     0.3%
343                   43     0.3%
340                   41     0.2%
290                   41     0.2%
322                   41     0.2%
272                   41     0.2%
Other values (1838)   16572  97.5%

Minimum 5 values

Value                 Count  Frequency (%)
1                     1      < 0.1%
2                     1      < 0.1%
3                     4      < 0.1%
4                     6      < 0.1%
5                     4      < 0.1%

Maximum 5 values

Value                 Count  Frequency (%)
6445                  1      < 0.1%
5471                  1      < 0.1%
5290                  1      < 0.1%
4957                  1      < 0.1%
4952                  1      < 0.1%
 

population
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 3683
Unique (%): 21.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1429.5739411764705
Minimum: 3.0
Maximum: 35682.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 132.8 KiB

Quantile statistics

Minimum: 3
5th percentile: 350.95
Q1: 790
Median: 1167
Q3: 1721
95th percentile: 3297.05
Maximum: 35682
Range: 35679
Interquartile range (IQR): 931

Descriptive statistics

Standard deviation: 1147.852959
Coefficient of variation (CV): 0.8029336057
Kurtosis: 80.86199702
Mean: 1429.573941
Median Absolute Deviation (MAD): 437.5
Skewness: 5.187211878
Sum: 24302757
Variance: 1317566.416
[Figure: Histogram with fixed size bins (bins=10)]
Common values

Value                 Count  Frequency (%)
891                   23     0.1%
1052                  22     0.1%
1227                  19     0.1%
926                   19     0.1%
761                   19     0.1%
810                   19     0.1%
850                   19     0.1%
735                   18     0.1%
1056                  18     0.1%
781                   18     0.1%
Other values (3673)   16806  98.9%

Minimum 5 values

Value                 Count  Frequency (%)
3                     1      < 0.1%
6                     1      < 0.1%
8                     2      < 0.1%
9                     2      < 0.1%
11                    1      < 0.1%

Maximum 5 values

Value                 Count  Frequency (%)
35682                 1      < 0.1%
28566                 1      < 0.1%
16122                 1      < 0.1%
15507                 1      < 0.1%
15037                 1      < 0.1%
 

households
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 1740
Unique (%): 10.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 501.2219411764706
Minimum: 1.0
Maximum: 6082.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 132.8 KiB

Quantile statistics

Minimum: 1
5th percentile: 126
Q1: 282
Median: 409
Q3: 605.25
95th percentile: 1172.1
Maximum: 6082
Range: 6081
Interquartile range (IQR): 323.25

Descriptive statistics

Standard deviation: 384.5208409
Coefficient of variation (CV): 0.7671668163
Kurtosis: 20.69264455
Mean: 501.2219412
Median Absolute Deviation (MAD): 150
Skewness: 3.342668363
Sum: 8520773
Variance: 147856.2771
[Figure: Histogram with fixed size bins (bins=10)]
Common values

Value                 Count  Frequency (%)
306                   48     0.3%
386                   48     0.3%
282                   47     0.3%
330                   46     0.3%
426                   45     0.3%
380                   44     0.3%
335                   44     0.3%
284                   43     0.3%
316                   43     0.3%
329                   43     0.3%
Other values (1730)   16549  97.3%

Minimum 5 values

Value                 Count  Frequency (%)
1                     1      < 0.1%
2                     2      < 0.1%
3                     2      < 0.1%
4                     4      < 0.1%
5                     7      < 0.1%

Maximum 5 values

Value                 Count  Frequency (%)
6082                  1      < 0.1%
5189                  1      < 0.1%
5050                  1      < 0.1%
4769                  1      < 0.1%
4616                  1      < 0.1%
 

median_income
Real number (ℝ≥0)

Distinct count: 11175
Unique (%): 65.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.8835781000000007
Minimum: 0.4999
Maximum: 15.0001
Zeros: 0
Zeros (%): 0.0%
Memory size: 132.8 KiB

Quantile statistics

Minimum: 0.4999
5th percentile: 1.603395
Q1: 2.566375
Median: 3.5446
Q3: 4.767
95th percentile: 7.36447
Maximum: 15.0001
Range: 14.5002
Interquartile range (IQR): 2.200625

Descriptive statistics

Standard deviation: 1.908156518
Coefficient of variation (CV): 0.4913398081
Kurtosis: 4.76414493
Mean: 3.8835781
Median Absolute Deviation (MAD): 1.07405
Skewness: 1.626693098
Sum: 66020.8277
Variance: 3.641061299
[Figure: Histogram with fixed size bins (bins=10)]
Common values

Value                 Count  Frequency (%)
3.125                 41     0.2%
4.125                 39     0.2%
2.875                 39     0.2%
15.0001               38     0.2%
2.625                 36     0.2%
3.875                 33     0.2%
3.625                 31     0.2%
3                     31     0.2%
4.375                 30     0.2%
3.375                 28     0.2%
Other values (11165)  16654  98.0%

Minimum 5 values

Value                 Count  Frequency (%)
0.4999                11     0.1%
0.536                 7      < 0.1%
0.6433                1      < 0.1%
0.6775                1      < 0.1%
0.6825                1      < 0.1%

Maximum 5 values

Value                 Count  Frequency (%)
15.0001               38     0.2%
15                    1      < 0.1%
14.9009               1      < 0.1%
14.5833               1      < 0.1%
14.4219               1      < 0.1%
 

median_house_value
Real number (ℝ≥0)

Distinct count: 3694
Unique (%): 21.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 207300.91235294117
Minimum: 14999.0
Maximum: 500001.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 132.8 KiB

Quantile statistics

Minimum: 14999
5th percentile: 66000
Q1: 119400
Median: 180400
Q3: 265000
95th percentile: 495500
Maximum: 500001
Range: 485002
Interquartile range (IQR): 145600

Descriptive statistics

Standard deviation: 115983.7644
Coefficient of variation (CV): 0.5594947126
Kurtosis: 0.3039975986
Mean: 207300.9124
Median Absolute Deviation (MAD): 68800
Skewness: 0.9730366335
Sum: 3524115510
Variance: 1.34522336e+10
[Figure: Histogram with fixed size bins (bins=10)]
Common values

Value                 Count  Frequency (%)
500001                814    4.8%
137500                95     0.6%
162500                89     0.5%
112500                85     0.5%
187500                74     0.4%
225000                73     0.4%
87500                 64     0.4%
350000                64     0.4%
150000                52     0.3%
67500                 51     0.3%
Other values (3684)   15539  91.4%

Minimum 5 values

Value                 Count  Frequency (%)
14999                 4      < 0.1%
17500                 1      < 0.1%
22500                 3      < 0.1%
25000                 1      < 0.1%
26600                 1      < 0.1%

Maximum 5 values

Value                 Count  Frequency (%)
500001                814    4.8%
500000                22     0.1%
499100                1      < 0.1%
499000                1      < 0.1%
498800                1      < 0.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and 1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and positive changes in scale of the two variables. This implies that for an exactly linear relationship, the slope (the line's angle to the x-axis) does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
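The definition translates directly into code using only the standard library. The usage lines also illustrate the invariance property: any increasing straight line gives r = 1 and any decreasing one gives r = -1, whatever the slope.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: cov(X, Y) / (std(X) * std(Y)), using sample (n - 1) estimates."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (std_x * std_y)


xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 1 for x in xs]))    # increasing line: 1 (up to rounding)
print(pearson_r(xs, [-10 * x + 3 for x in xs]))  # decreasing line: -1 (up to rounding)
print(pearson_r(xs, [1, 3, 2, 5, 4]))            # noisy increasing relation: 0.8 (up to rounding)
```

The (n - 1) factors cancel in the ratio, so using population (n) estimates throughout gives the same r.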

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and 1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one replaces each value by its rank within its variable, then divides the covariance of the two rank variables by the product of their standard deviations.
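Spearman's ρ is Pearson's r applied to ranks. The sketch below assumes tied values receive the average of the ranks they span. This is the usual convention and the default (`method="average"`) of `scipy.stats.rankdata`. The example pair is monotonic but not linear, so ρ reaches 1 while r does not:

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson's r; the common 1/(n - 1) factors cancel, so plain sums suffice."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]     # monotonic but not linear
print(spearman_rho(xs, ys))   # 1.0: the ordering is perfectly preserved
print(pearson_r(xs, ys))      # below 1: the relationship is not a straight line
```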

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and 1 indicates total positive correlation.

To calculate τ for n observations of two variables X and Y, one counts the pairs of observations that are concordant (X and Y order the two observations the same way) and discordant (X and Y order them oppositely). τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
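The counting rule can be implemented directly. This is the tau-a variant, which examines all O(n²) pairs; a pair tied in X or Y counts as neither concordant nor discordant. Note that pandas' `DataFrame.corr(method="kendall")` delegates to SciPy's `scipy.stats.kendalltau`. That function defaults to the tau-b variant, which also corrects the denominator for ties, so the two give different results when ties are present:

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        # Positive product: both variables order the pair the same way (concordant);
        # negative: discordant; zero (a tie in X or Y): neither.
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # 2 concordant, 1 discordant, 3 pairs: (2 - 1) / 3
```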

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

Sample

First rows

   longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value
0    -114.31     34.19                15.0       5612.0          1283.0      1015.0       472.0         1.4936             66900.0
1    -114.47     34.40                19.0       7650.0          1901.0      1129.0       463.0         1.8200             80100.0
2    -114.56     33.69                17.0        720.0           174.0       333.0       117.0         1.6509             85700.0
3    -114.57     33.64                14.0       1501.0           337.0       515.0       226.0         3.1917             73400.0
4    -114.57     33.57                20.0       1454.0           326.0       624.0       262.0         1.9250             65500.0
5    -114.58     33.63                29.0       1387.0           236.0       671.0       239.0         3.3438             74000.0
6    -114.58     33.61                25.0       2907.0           680.0      1841.0       633.0         2.6768             82400.0
7    -114.59     34.83                41.0        812.0           168.0       375.0       158.0         1.7083             48500.0
8    -114.59     33.61                34.0       4789.0          1175.0      3134.0      1056.0         2.1782             58400.0
9    -114.60     34.83                46.0       1497.0           309.0       787.0       271.0         2.1908             48100.0

Last rows

       longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value
16990    -124.22     41.73                28.0       3003.0           699.0      1530.0       653.0         1.7038             78300.0
16991    -124.23     41.75                11.0       3159.0           616.0      1343.0       479.0         2.4805             73200.0
16992    -124.23     40.81                52.0       1112.0           209.0       544.0       172.0         3.3462             50800.0
16993    -124.23     40.54                52.0       2694.0           453.0      1152.0       435.0         3.0806            106700.0
16994    -124.25     40.28                32.0       1430.0           419.0       434.0       187.0         1.9417             76100.0
16995    -124.26     40.58                52.0       2217.0           394.0       907.0       369.0         2.3571            111400.0
16996    -124.27     40.69                36.0       2349.0           528.0      1194.0       465.0         2.5179             79000.0
16997    -124.30     41.84                17.0       2677.0           531.0      1244.0       456.0         3.0313            103600.0
16998    -124.30     41.80                19.0       2672.0           552.0      1298.0       478.0         1.9797             85800.0
16999    -124.35     40.54                52.0       1820.0           300.0       806.0       270.0         3.0147             94600.0